home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Skunkware 5
/
Skunkware 5.iso
/
src
/
X11
/
wais
/
doc
/
protspec.txt
< prev
next >
Wrap
Text File
|
1995-05-09
|
38KB
|
916 lines
WAIS Interface Protocol
Prototype Functional Specification
Version 1.5
April 23, 1990
Franklin Davis, Brewster Kahle, Harry Morris, Jim Salem, Tracy Shen
Thinking Machines Corporation
Rod Wang, John Sui, Mark Grinbaum
Dow Jones & Company, Inc.
Contents
1. Overview
1.1 Supported Facilities
1.2 Unsupported Facilities
1.3 Conformance with Version 1 of Z39.50
1.4 Errors in the Standard
2. Initialization Facility
2.1 Init APDU
2.2 Init-Response APDU
3. Search Facility
3.1 Search APDU
3.2 Search-Response APDU
4. Element-Set-Names supported by DowQuest
4.1 Document-Header-Request
4.2 Document-Text-Request
4.3 Document-Header
4.4 Document-Text
4.5 Document-Short-Header
4.6 Document-Headline
4.7 Document-Long-Header
4.8 Document-Codes
5. Data Element Definitions
5.1 Tag Values of the Data Element
Appendix A. Type-3 Query (Relevance Feedback)
Appendix B. Sample APDUs in WAIS Demonstration System
B.1 Init APDU
B.2 Init-Response APDU
B.3 Search APDU
B.4 Search-Response APDU
Appendix C. DowQuest Code Formats
1. Overview
The purpose of this interface is to establish an application level
(ISO 2) protocol for query/retrieval applications. The initial
implementation will provide a protocol for the DowQuest database
service provided by Dow Jones News Retrieval. Workstation interfaces
will be implemented on the Macintosh as part of the WAIS project (Wide
Area Information Server). The intention is to provide a sophisticated
and expandable computer-to-computer interface for future databases.
This protocol is based on the Z39.50-1988 ("the standard") Information
Retrieval Service Definitions and Protocol Specification for Library
Applications. Each section of this document includes references in
square brackets "[]" to the appropriate section(s) in the Z39.50
specification.
The standard specifies an Opens Systems Interconnection application
layer service definition and protocol specification for Information
Retrieval. The Information Retrieval protocol allows an application
on one computer to query the database of another computer. The
protocol specifies the procedures and structures for the intersystem
submission of a search request (including the syntax of the query),
request for the transmission of database records located by a search,
the responses to the request, access control, and resource control.
This is the last version of the WAIS protocol to be based on the
Z39.50 standard. The next version will implement the newer SR-1
standard, which is based on Z39.50, but is written in ASN.1.
The WAIS extensions to the standard are primarily to support
"relevance feedback" queries. (The standard currently supports a
boolean query syntax.) The Present facility is not used, in order to
allow the target system to be "stateless" (to always delete Result-
Sets.) Instead, a Type-1 query is used for text retrieval. In order
to retrieve document number xxx, a search is performed with a query
specifying that System-Control-Number=xxx.
The WAIS extensions also enable the origin to request a range of
document text. The Type-1 query is used as described in the previous
paragraph with the addition of Chunk-Code parameters. The portion of
the document that matches the Chunk-Code values will be returned, e.g.
"System-Control-Number=xxx AND Line>1000 AND Line <= 2000" would
return lines 1001 through 2000 of document xxx.
This protocol requires the target system to return unique document IDs
in a Search-Response, labeled as System-Control-Number (see Appendix C
of the standard). These document IDs are used by the origin (user
interface) to specify documents when requesting display of a document
or in relevance feedback searches.
Retrieval of large documents dependsw on the ability to specify a
range of a document in a search. This will be specified with an
extension called "Chunks." This version of the protocol does not have
a method for the origin and target to negotiate the available chunk
types. Three chunk types are currently defined for DowQuest: Byte,
Line, and Paragraph.
For efficiency reasons it is useful to refer to a document range with
large "chunks" that have been marked in the text by the target system.
The chunk markers and IDs are not displayed to the user, but are used
by the origin when the user selects a range of a document for a
relevance feedback query. The Init-Response APDU is extended to
provide "chunk" markers and sizes which may be used to specify
document ranges in relevance feedback queries.
The User Information part of APDUs is used in more complex ways in
this extension than was originally envisioned in the standard. In the
standard, the User Information part was a single Element of type
"any." The WAIS protocol extensions uses User-Information-Field
preceding the set of elements in the user information part of an APDU.
This is the length in bytes of all the following elements, excluding
the User-Information-Length element.
1.1 Supported Facilities
For the June 1990 target delivery date of the prototype WAIS system,
DowQuest will support only 2 facilities from the Z39.50 specification.
The "Initialization Facility" [3.2.1] includes an "Init APDU"
[4.1.1.1, table A2] and an "Init-Response APDU" [4.1.1.2, Table A3].
The "Search Facility" [3.2.2] includes a "Search APDU" [4.1.1.3, table
A4] and a "Search-Response APDU" [4.1.1.4, table A5].
"APDU" means "Application Protocol Data Unit," which is a unit of data
passed between an origin (user workstation) and target (database
server). These and other terms are defined in section 2 of the Z39.50
specification.
The Search APDU will be extended to have a new query type: Type-3,
"Relevance Feedback Query."
The Search-Response APDU will be modified to include new elements in
Database-Records, including Document-IDs (used for relevance feedback)
and other fields, specified in section 4 of this document.
1.2 Unsupported Facilities
The remaining 5 facilities from Z39.50 are not supported in the WAIS
prototype.
The "Retrieval Facility" will not be supported in the Wais prototype.
Document text will be retrieved using a Type-1 query based on
System-Control-Number (document ID).
The "Result-Set-Delete Facility" is not needed because DowQuest will
always delete all Result-Sets after returning a Search-Response APDU.
The "Access Control Facility" will not be supported. All users will
have access to all data in DowQuest.
The "Accounting/Resource Control Facility" will not be supported.
DowQuest responses have a maximum size.
The "Termination Facility" is not needed because DowQuest will not
store any state about user sessions. Each request and response will
be a complete transaction, independent of all others. Either the
origin or the target may abort a session at any time.
1.3 Conformance with Version 1 of Z39.50
1.3.1 Extensibility
As specified in section 4.3 of the standard, WAIS systems will ignore
unknown data elements and options in received Init APDUs.
1.3.2 Static Requirements
The DowQuest system will conform to the Static Requirements specified
in section 4.4.1 of the standard, with extensions noted in this
document, except that it will NOT support general boolean Type-1
queries. The Type-1 query will be used only for retrieval of
documents based on System-Control-Number and Chunks.
1.3.3 Dynamic Requirements
WAIS systems will conform to the Dynamic Requirements specified in
section 4.4.2 of the standard. There are restrictions on the Type-1
Query.
1.3.4 Statement Requirements
DowQuest will be capable of acting in the role of target. It supports
version 1 of the standard.
See section 1.2 of this document for unsupported facilities.
Result-Sets will always be unilaterally deleted by DowQuest. It will
not accept Search APDUs specifying named result sets. Each input and
response message pair is a complete, independent transaction. Thus,
multiple users may share a single session, although the order of
responses is not guaranteed to be the same order as the requests. If
multiple users share a connection, the origin must use Reference-IDs
to identify input/response message pairs.
DowQuest supports element set names in Search APDUs as specified in
section 4 of this document.
The maximum number of database names that may be specified in a Search
APDU will be determined by the implementors.
1.4 Errors in the Standard
Table A7 on p. 43 of the standard is a copy of table A6. Table A7
should contain the fields defined in 4.1.1.6, p. 23. Earlier
versions of the WAIS protocol specification contained the same error
in table B.6.
2. Initialization Facility
DowQuest will accept an Init APDU at any time, and will always respond
with an Init-Response APDU. Since DowQuest is stateless, the
Initialization facility is not required to begin a user session, but
it may be used anytime to get the system parameters.
The Init-Response APDU may specify "chunk" parameters that may be used
to specify a range of a document in a relevance feedback Type-3 Query.
[??? The chunk negotiation needs to be defined more completely.]
The Init-Response APDU may also specify newline characters,
non-displayable field markers, and highlight/non-highlight markers,
and fields describing how often the target is updated and when the
target is updated.
2.1 Init APDU
The Init APDU requests information about the database service [3.2.1,
4.1.1.1, and Table A2]. Since DowQuest is stateless, Init is not
required to begin a user session.
The Options field must always have 0="will not use" for the Delete
facility.
See Appendix B.1 of this document for an example Init APDU.
2.2 Init-Response APDU
The Init-Response APDU provides information about the database service
[3.2.1, 4.1.1.2, and Table A3].
The Options field will always have 0="will not support" for the
access-control and resource-control facilities.
Implementation-Name will be "DowQuest", and the Implementation-Version
will be set by the implementors, to be updated as new versions are
released.
Preferred-Message-Size and Maximum-Record-Size will be determined
during the implementation.
See Appendix B.2 of this document for an example Init-Response APDU.
2.2.1 Chunk IDs
The User-Information-Field of the Init-Response APDU will contain
four elements indicating ways the origin may specify a region of a
document to be used in a relevance feedback Type-3 query. The region
is composed of a range of "chunks" such as bytes or paragraphs. The
elements are:
Search-Chunk-Code-Bitmap O bitmap
Present-Chunk-Code-Bitmap [???] O bitmap
Chunk-ID-Length C integer
Chunk-Marker C ASCII
Search-Chunk-Code-Bitmap specifies the chunk codes the target will
accept in Type-1 Queries in Search APDUS requesting display of
document regions. The bitmap indicates with a "1" in a bit position
that the corresponding code number will be accepted by the target
system. For example, to indicate that the target accepts accepts
Chunk-Codes 1 and 3 in a Search APDU it would return
Search-Chunk-Code-Bitmap with bits 1 and three set to 1 and all other
bits 0.
Initially, four Chunk-Codes are defined. The default is 1 "Byte" (see
section 5 of this document):
Chunk-Code=0 "Document"
Chunk-Code=1 "Byte"
Chunk-Code=2 "Line"
Chunk-Code=3 "Paragraph"
(In the future this may be extended to include other measures, such as
Word, Page, or Chapter-ID. Other media such as audio might use chunks
such as Song-ID or Seconds. Video might use Frame or Scene-ID.)
Chunk-Code=1 "Byte" is the most general case. With this chunk size,
Chunk-Marker and Chunk-ID-Length are not used. The origin may
indicate ranges of a document in bytes by setting Chunk-Code=1 and
providing pairs of byte-offsets in a relevance feedback Type-3 query.
If any Chunk-Code > 1 is accepted, the target must also provide
Chunk-ID-Length and Chunk-Marker.
DowQuest will provide Chunk-Code=3 (Paragraph-ID) for relevance
feedback Type-3 Queries, and Chunk-Code=2 (Line) for text retrieval
Type-1 Queries.
[??? Need more general chunk mechanism for both tagged and counted
types, e.g. paragraphs are tagged, but lines are counted (each line is
"tagged" only by the presence of a newline). This will be addressed
in the next version of the protocol.]
2.2.2 Other Markers
DowQuest will also provide elements in the User-Information field of
the Init-Response APDU indicating various non-displayable marker
fields. These include:
Highlight-Marker O ASCII
De-Highlight-Marker C ASCII
Newline-Characters O ASCII
If Highlight-Marker is present, De-Highlight-Marker is required.
2.2.3 Other Information Elements
WAIS targets may provide elements describing how often and when the
database is updated:
Update-Frequency O [???]
Update-Times O [???]
[??? pricing info?] O [???]
[The format and tags of these fields is TBD.]
3. Search Facility
3.1 Search APDU
The Search APDU will be implemented as defined in the standard [3.2.2,
4.1.1.3, and Table A4]. However, the Result-Set will always be
deleted by DowQuest immediately after returning a Search-Response
APDU, so the Replace-Indicator field in the Search APDU should be
"on," an and Result-Set-Names is not used. Search APDUs may not refer
to a Result-Set. This enables DowQuest to be stateless.
The Type-3 Relevance Feedback Query syntax is outside the scope of the
standard. The syntax used by DowQuest is given in Appendix A.
DowQuest will support the Type-1 Query syntax, but not for general
boolean queries. Only searches specifying System-Control-Number (and
possibly Chunk ranges) are supported.
See Appendix B.3 of this document for an example Search APDU.
3.2 Search-Response APDU
The Search-Response APDU is almost the same as specified in the
standard [3.2.2, 4.1.1.4, and table A5], with a new type of
Database/Diagnostic-Record. The elements used in Database-Records
[3.2.2.1.5, A.1.3.1] are specified in section 4 of this document.
The Result-Set will always be deleted by the DowQuest immediately
after sending a Search-Response APDU.
The default element set returned in each Database-Record by DowQuest
in a Search-Response APDU is "Document-Header," defined in section 5
of this document.
For records that are beyond the Medium-Set-Present-Number in the
Search APDU, DowQuest will return the "Document-Short-Header" element
set. This will probably not happen in normal circumstances since
DowQuest returns a maximum of 16 documents. The origin can request
the Date/Score/Headline/etc. elements by requesting a Document-
Headline element set in subsequent Search APDUs. [??? Perhaps we
should use message-length or buffer sizes to control this, instead?]
See Appendix B.4 for an example Search-Response APDU.
4. Element Sets supported by DowQuest
The elements supported by a particular target are outside the Z39.50
standard [3.2.2.1.3]. DowQuest will support the following
Element-Set-Names. These are used in Search and Search-Response
APDUs. Element-Set-Names is an optional field in Search APDUs [Table
2, Table 3].
Elements marked with a "*" can only appear in a Search-Response APDU,
since the information is deleted with the Result-Set, so is no longer
available when requesting text, i.e. the text headline and code
elements should only be used with Type-1 queries.
The second column notes whether an element is Required, Optional, or
Conditional in a given APDU.
The elements and their tag values are defined in section 5 of this
document.
4.3 Document-Header
A Search-Response APDU contains one variable element:
Seed-Words-Used O ASCII
The rest of this element set is returned by default for each
Database-Record in a Search-Response APDU:
System-Control-Number R ANY
Version-Number O integer
Score * O integer
Best-Match * O integer
[???] Lines O integer
Document-Length O integer
Source O ASCII
Date O ASCII
Title C ASCII
Geographic-Name O ASCII
4.4 Document-Text
This element set may be returned for each Database-Record in a
Search-Response APDU in response to a Type-1 query:
Document-ID R ANY
Version-Number O integer
Document-Text R ASCII
4.5 Document-Short-Header
This element set is returned in the Database-Record in a
Search-Response APDU for documents that are beyond the
Medium-Set-Present-Number:
Document-ID R ANY
Version-Number O integer
Score * O integer
Best-Match * O integer
Document-Length R integer
4.6 Document-Headline
This element set is returned in a Search-Response APDU when requested
in a Type-1 Query in a Search APDU for documents that were previously
returned with Document-Short-Header element sets because of size
restrictions:
Document-ID R ANY
Version-Number O integer
Source O ASCII
Date O ASCII
Headline R ASCII
Origin O ASCII
4.7 Document-Long-Header
This element set may be optionally requested in a Search APDU to be
returned in a Search-Response APDU:
Document-ID R ANY
Version-Number O integer
Score * O integer
Best-Match * O integer
Document-Length R integer
Source O ASCII
Date O ASCII
Headline R ASCII
Origin O ASCII
Stock-Codes O ASCII
Company-Codes O ASCII
Industry-Codes O ASCII
[??? what about more general codes, e.g. author, pricing,
copyright?]
4.8 Document-Codes
This element set is returned in a Search-Response APDU when requested
in a Search APDU:
Document-ID R ANY
Version-Number O integer
Stock-Codes O ASCII
Company-Codes O ASCII
Industry-Codes O ASCII
6. Data Element Definitions
Begin-Date-Range is the latest date for finding documents in a query
where Date-Factor is DF_LATER or DF_SPECIFIED_RANGE. Dates are ASCII,
of the form yyyymmdd.
Best-Match is the approximate byte offset within a document of the
highest-scoring portion of the document.
Chunk-Code specifies the size of chunks used in document regions. The
default value is 1. In DowQUest two Chunk-Codes are supported:
DowQuest will provide Chunk-Code=3 (Paragraph-ID) for relevance
feedback Type-3 Queries in a Search APDU, and Chunk-Code=2 (Line) for
text retrieval Type-1 Queries in a Search APDU. Chunk-Code=1 (Byte)
is the most general case. With this chunk size, Chunk-Marker and
Chunk-ID-Length are not used. The origin may indicate ranges of a
document in bytes by setting Chunk-Code=1 and providing pairs of
byte-offsets in a relevance feedback Type-3 query. Otherwise, the
origin indicates chunk ranges by specifying Chunk-Start-ID and
Chunk-End-ID.
Chunk-End-ID -- see Chunk-Start-ID.
Chunk-ID-Length specifies how many bytes Chunk-IDs will be. In
DowQuest Chunk-ID-Length for paragraphs is 3 bytes. The contents of a
Chunk-ID is opaque to the origin system. The value is used unchanged
when specifying a chunk range in a relevance feedback Type-3 query.
Chunk-Marker specifies an ASCII byte sequence that will occur in the
document text as a delimiter for the start of a chunk (except
Chunk-Code=1 (Byte) which has no markers). In DowQuest Chunk-IDs for
paragraphs are preceded by "<ESC>l" which is a two-byte Chunk-Marker.
Chunk-Start-ID and Chunk-End-ID are either Chunk-IDs (type ANY) that
were each marked with a Chunk-Marker in the text of a document
returned in a Search-Response APDU; or, if Chunk-Code=1, they are
integers containing byte offsets in the text of the document. They
delimit the beginning and end of a user-selected relevant region of
the document to be used for a relevance feedback query.
Company-Codes contains ASCII codes describing companies that are
mentioned in a document.
Date is the ascii date a document was published (yyyymmdd).
Date-Factor is one of: 1 "DF_INDEPENDENT", 2 "DF_LATER", 3
"DF_EARLIER", or 4 "DF_SPECIFIED_RANGE". The default is
Date-Factor=1, which specifies no special weighting of dates. The
other 3 values specify bonus scoring for documents with dates greater,
less than, or between specified dates, respectively. Date-Factor=2
uses Begin-Date-Range, Date-Factor=3 uses End-Date-Range, and
Date-Factor=4 uses both.
De-Highlight-Marker -- see Highlight-Marker.
Document-ID is a field that was previously returned in a
Search-Response APDU. It is unique in the database being searched.
It must be used in a Search APDU exactly as it was returned in a
Search-Response APDU. See Document-ID-Chunk.
Document-ID-Chunk is the same as a Document-ID element, except that it
must be followed by two or three chunk elements defining a fragment of
the document: Chunk-Code, Chunk-Start-ID, Chunk-End-ID. Chunk-Code is
optional; if Chunk-Code is missing, the previous value of Chunk-Code
in the current APDU is used; or if Chunk-Code never appeared in this
APDU, the default value is Chunk-Code=1 (Byte).
Document-Length is the length of the entire document in bytes.
Document-Text is a portion of a document text.
End-Date-Range is the earliest date for finding documents in a query
where Date-Factor is DF_EARLIER or DF_SPECIFIED_RANGE. Dates are ASCII,
of the form yyyymmdd.
Headline is a short ASCII description of the document for presentation
to the user. In DowQuest it is a maximum of 160 bytes [??? is this a
requirement?].
Highlight-Marker and De-Highlight-Marker are character sequences that
precede and follow text that may be displayed with highlighting. In
DowQuest, every searchable term is preceded by "<DC1>" (0x11) and
followed by "<DC3>" (0x13).
Industry-Codes contains ASCII codes describing industries that are
mentioned in a document.
Max-Documents-Retrieved is the maximum number of documents requested
by the origin in a Search APDU to be returned in a Search-Response
APDU. In DowQuest the default value is 16 [??? probably should not
have a default value?]. The target may return less than
Max-Documents-Retrieved documents.
Newline-Characters indicates what characters are used at the end of
lines. In DowQuest this is "<CR>" (0x0D).
Origin-City is an ASCII name of the city and/or country where a
document originated.
Present-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Present APDU to specify a text range of a document to be
returned. See Search-Chunk-Code-Bitmap for its definition. [??? This
is obsolete. Chunk-Codes must be worked out more completely.]
Score is a measure of how well the document matched the query. It may
be any integer value. [??? We may need to define a valid score range
to be used by all targets, or add a field in the Init-Response APDU to
specify the range for the current target.]
Search-Chunk-Code-Bitmap is a bitmap indicating what Chunk-Codes may
be used in a Search APDU query to specify a range of a document. The
bitmap indicates with a "1" in a bit position that the corresponding
code number will be accepted by the target system. For example, to
indicate that the target accepts accepts Chunk-Codes 1 and 3 in a
Search APDU it would return Search-Chunk-Code-Bitmap with bits 1 and
three set to 1 and all other bits 0.
Seed-Words is a text string containing the initial seed words in a
relevance feedback Type-3 query.
Seed-Words-Used is the same format as Seed-Words except it contains
only words that actually matched some documents in the database. This
allows the user interface to give the user feedback about which seed
words were effective in a query.
Source is an ASCII string identifying the original source of a
document (e.g. newspaper name, journal title, etc.)
Stock-Codes contains ASCII stock ticker codes for companies that are
mentioned in a document.
Text-List is a list of text strings that are provided by the user.
They are document fragments that come from outside the DowQuest
database which the user wants to use in a search. They are processed
in the same manner as seed words except they are not given seed word
weight bonuses. **This would be a new feature of a query within
DowQuest, and would require changes to the Query Server as well as the
User Server portion of DowQuest. It will not be implemented for the
June '90 prototype.
User-Information-Length is the length of the entire user information
part of an APDU when it consists of more than one element.
User-Information-Length does not include itself in the length.
Version-Number is used to validate a local copy of a document's text.
If a document is modified in the target server, its Version-Number
must be incremented. If a document may not be cached, Version-Number
is set to 0. The default value is 0.
5.1 Tag Values of the Data Element
This table is an extension to the table 19 in section 4.1.3 of the
standard.
Element Tag PDU R/O/C
_____________________________________________________________
User-Information-Length[???] 99 Init-Response C
Search C
Search-Response C
Chunk-Code 100 Search O
Chunk-ID-Length 101 Init-Response C
Chunk-Marker 102 Init-Response C
Highlight-Marker 103 Init-Response O
De-Highlight-Marker 104 Init-Response C
Newline-Characters 105 Init-Response O
Seed-Words 106 Search C
Document-ID-Chunk 107 Search O
Chunk-Start-ID 108 Search O
Chunk-End-ID 109 Search C
Text-List 110 Search O
Date-Factor 111 Search O
Begin-Date-Range 112 Search O
End-Date-Range 113 Search C
Max-Documents-Retrieved 114 Search R
Seed-Words-Used 115 Search-Response O
Document-ID 116 Search O
Search-Response R
Version-Number 117 Search-Response O
Score 118 Search-Response O
Best-Match 119 Search-Response O
Document-Length 120 Search-Response R
Source 121 Search-Response O
Date 122 Search-Response O
Headline 123 Search-Response C
Origin-City 124 Search-Response O
Search-Chunk-Code-Bitmap 125 Search O
Present-Chunk-Code-Bitmap [???] 126 Search O
Document-Text 127 Search-Response R
Stock-Codes 128 Search-Response O
Company-Codes 129 Search-Response O
Industry-Codes 130 Search-Response O
Appendix A. Type-3 Query (Relevance Feedback)
Query syntax is not part of the Z39.50 specification, but a Type-1
query is suggested in Appendix B of the standard for Boolean queries.
This is a similar suggestion for relevance feedback queries.
The Type-3 Query supports the relevance feedback style of database
query (as provided by DowQuest). The Type-3 query includes the
following elements:
Seed-Words R ASCII
Document-ID O ANY (see Note 1 below)
Document-ID-Chunk O ANY (see Note 2 below)
Chunk-Code O binary
Chunk-Start-ID C if Chunk-Code=1, binary
else ANY
Chunk-End-ID C if Chunk-Code=1, binary
else ANY
(may repeat Document-ID and Document-ID-Chunk elements)
Text-List O ASCII (Not in DowQuest)
Date-Factor O integer
Begin-Date-Range C ASCII
End-Date-Range C ASCII
Max-Documents-Retrieved R integer
Note 1: There may be any number of Document-ID and Document-ID-Chunk
elements in a Type-3 Query, intermixed.
Note 2: Each occurrence of a Document-ID-Chunk element must be
followed by two or three chunk elements, defining a fragment of the
document.
Appendix B. Sample APDUs in WAIS Demonstration System
In the following, binary values are shown in hexadecimal preceded by
0x. Variable fields include a tag and length [see A.1.2.1, A.1.2.2,
and Table 19]. See section 5.1 of this document for tag values for
WAIS elements.
B.1 Init APDU
[see Table 7, Table A2]
ITEM BYTE POS. VALUE NOTE
______________________________________________________________________
Header-Length-Indicator 1-2 0x0015 21
Header:
Fixed portion:
PDU-Type 3 0x14 20
Variable Portion:
Protocol-Version 4-6 0x030101 1
Options 7-9 0x0401C0 bit 1,2
Preferred-Message-Size 10-13 0x05020400 1024
Maximum-Record-Size 14-17 0x06020800 2048
Reference-ID 18-23 0x020400000001 1
User information part:
(none)
B.2 Init-Response APDU
[see Table 8, Table A3]
ITEM BYTE POSITION. VALUE NOTE
______________________________________________________________________
Header-Length-Indicator 1-2 0x0025 37
Header:
Fixed portion:
PDU-Type 3 0x15 21
Result 4 0x01 1="accept"
Variable Portion:
Protocol-Version 5-7 0x030101 1
Options 8-10 0x0401C0 bit 1,2
Preferred-Message-Size 11-14 0x05020400 1024
Maximum-Record-Size 15-18 0x06020400 1024
Implementation-Name 19-28 0x0908"DowQuest"
Implementation-Version 29-33 0x1003"1.0"
Reference-ID 34-39 0x020400000001 1
User-Information-Field 40-42 0x??0217 ??
Search-Chunk-Code-Bitmap 43-45 0x7D0140 bit 2
Present-Chunk-Code-Bitmap?? 46-48 0x7E0180 bit 1
Chunk-Id-Length 49-51 0x650103 3
Chunk-Marker 52-55 0x66021B6C "<ESC>l"
Highlight-Marker 56-58 0x670111 "<DC1>"
De-Highlight-Marker 59-61 0x680112 "<DC2>"
Newline-Characters 62-65 0x69020D0A "<CR><LF>"
B.3 Search APDU
[see Table 9, Table A4]
B.3.1 Example query containing only Seed-Words element (no
Document-ID):
ITEM BYTE POSITION. VALUE NOTE
______________________________________________________________________
Header-Length-Indicator 1-2 0x0018 24
Header:
Fixed portion:
PDU-Type 3 0x16 22
Small-Set-Upper-Bound 4-6 0x000400 1024
Large-Set-Lower-Bound 7-9 0x000800 2048
Medium-Set-Present-Number 10-12 0x000800 2048
Replace-Indicator 13 0x01 1="on"
Variable Portion:
Result-Set-Name 14-15 0x1100 ""
Database-Names 16-17 0x1200 ""
Query-Type 18-20 0x130133 "3"
Reference-ID 21-26 0x020400000002 2
User-Information-Field 27-29 0x??0224 36
Type-3 Query:
Seed-Words 30-62 0x6A1F"Tell me about
Thinking Machines"
Max-Documents-Retrieved 63-65 0x720110 16
[??? remove this field; use Small-Set-Upper-Bound or something...]
B.3.2 Example query containing Seed-Words, one Document-ID and
one Document-ID-Chunk element. This query includes seed word
"Apple," and specifies using all of document 00000001WJ in the
search, and paragraphs with IDS 005 through 007 from document
00000023WJ:
ITEM BYTE POSITION. VALUE NOTE
______________________________________________________________________
Header-Length-Indicator 1-2 0x0018 24
Header:
Fixed portion:
PDU-Type 3 0x16 22
Small-Set-Upper-Bound 4-6 0x000400 1024
Large-Set-Lower-Bound 7-9 0x000800 2048
Medium-Set-Present-Number 10-12 0x000800 2048
Replace-Indicator 13 0x01 1="on"
Variable Portion:
Result-Set-Name 14-15 0x1100 ""
Database-Names 16-17 0x1200 ""
Query-Type 18-20 0x130133 "3"
Reference-ID 21-26 0x020400000003 3
User-Information-Field 27-29 0x??0230 48
Type-3 Query:
Seed-Words 30-36 0x6A05"Apple"
Max-Documents-Retrieved 37-39 0x720110 16
[??? remove this field; use Small-Set-Upper-Bound or something...]
Document-ID 40-51 0x740A00000001WJ
Document-ID-Chunk 52-63 0x740A00000023WJ
Chunk-Code 64-66 0x640102 paragraph
Chunk-Start-ID 68-72 0x6C03"005" par ID=005
Chunk-End-ID 73-77 0x6D03"007" par ID=007
B.4 Search-Response APDU
[see Table 10, Table A5]
ITEM BYTE POSITION. VALUE NOTE
______________________________________________________________________
Header-Length-Indicator 1-2 0x0014 20
Header:
Fixed portion:
PDU-Type 3 0x17 23
Search-Status 4 0x00 0="success"
Result-Count 5-7 0x000002 2
Number-of-Records-Returned 8-10 0x000002 2
Next-Result-Set-Position 11-13 0x000000 0
Variable Portion:
Present-Status 14-16 0x1B0100 0="success"
Reference-ID 17-22 0x020400000002 2
User-Information-Field 23-25 0x??01DD 221
Seed-Words-Used 26-44 0x7311"Thinking Machines"
Database records:
Document-Header element set:
Document-ID 45-58 0x740C"0000000001WJ"
Version-Number 59-61 0x750100 0
Score 62-67 0x760400000022 34
Best-Match 68-77 0x77080000000000000001
Document-Length 78-87 0x78080000000000000033
Source 88-92 0x7903"WSJ"
Date 93-100 0x7A06"900601" yymmdd *
Headline 101-109 0x7B11"TMC Releases WAIS"
Origin-City 110-124 0x7C0D"Cambridge, MA"
Document-ID 125-138 0x740C"0000000123ZF"
Version-Number 139-141 0x750100 0
Score 142-147 0x760400000015 21
Best-Match 148-157 0x7708000000000000006E
Document-Length 158-167 0x78080000000000000121
Source 168-182 0x790D"Business Week"
Date 183-190 0x7A06"900603"
Headline 191-211 0x7B13"Apple Releases WAIS"
Origin-City 212-226 0x7C0D"Cupertino, CA"
(*) A Date element should actually be yyyymmdd
Appendix C. DowQuest Code Formats
C.1 Company Codes
[??? TBD]
C.2 Industry Codes
[??? TBD]
C.3 Stock Codes
[??? TBD]